Elhuyar at Tweet-Norm 2013
Abstract
This paper presents the system developed by Elhuyar for the Tweet-Norm evaluation campaign, which consists of normalizing Spanish tweets to standard language. The normalization covers only the correction of certain out-of-vocabulary (OOV) words previously identified by the organizers. The system follows a two-step strategy. First, candidates for each OOV word are generated by several methods, each addressing a different source of error: expansion of common abbreviations, correction of colloquial forms, correction of repeated characters, normalization of interjections, and correction of spelling errors by means of edit-distance metrics. Next, the correct candidates are selected using a language model trained on corpora of correct Spanish text. The system obtained 68.3% accuracy on the development set and 63.36% on the test set, ranking 4th in the evaluation campaign.
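The two-step strategy described in the abstract can be illustrated with a minimal sketch. This is not the Elhuyar implementation: the tiny `VOCAB`, `ABBREVIATIONS`, and `CORPUS` resources, all function names, and the choice of an add-one-smoothed bigram model are assumptions made only for illustration. It generates candidates from an abbreviation table, repeated-character collapsing, and edit distance 1, then picks the candidate whose sentence scores highest under the language model.

```python
import math
import re
from collections import Counter

# Hypothetical toy resources; a real system would use large lexicons and corpora.
VOCAB = {"que", "te", "quiero", "mucho", "por", "favor", "hola", "tal"}
ABBREVIATIONS = {"q": "que", "xq": "porque", "tb": "también", "x": "por"}
CORPUS = ["te quiero mucho", "hola que tal", "por favor"]
ALPHABET = "abcdefghijklmnñopqrstuvwxyzáéíóúü"


def collapse_repeats(word):
    """Return variants with each run of a repeated character cut to 2 or to 1."""
    return {re.sub(r"(.)\1+", r"\1\1", word), re.sub(r"(.)\1+", r"\1", word)}


def edits1(word):
    """Return all strings at edit distance <= 1 (with transpositions) from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def generate_candidates(oov):
    """Step 1: propose in-vocabulary candidates for one OOV word."""
    word = oov.lower()
    candidates = set()
    if word in ABBREVIATIONS:
        candidates.add(ABBREVIATIONS[word])
    for variant in collapse_repeats(word) | {word}:
        if variant in VOCAB:
            candidates.add(variant)
        candidates |= edits1(variant) & VOCAB
    return candidates or {oov}  # fall back to the original form


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(tokens, unigrams, bigrams):
    """Log-probability of a token sequence under an add-one-smoothed bigram model."""
    padded = ["<s>"] + tokens + ["</s>"]
    vocab_size = len(unigrams)
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def normalize(tokens, index, unigrams, bigrams):
    """Step 2: choose the candidate that makes the whole sentence most probable."""
    def sentence_score(candidate):
        return score(tokens[:index] + [candidate] + tokens[index + 1:], unigrams, bigrams)
    # sorted() makes tie-breaking deterministic
    return max(sorted(generate_candidates(tokens[index])), key=sentence_score)


UNIGRAMS, BIGRAMS = train_bigram(CORPUS)
print(normalize(["hola", "qe", "tal"], 1, UNIGRAMS, BIGRAMS))      # que
print(normalize(["qe", "quiero", "mucho"], 0, UNIGRAMS, BIGRAMS))  # te
```

In this toy setup the OOV form "qe" has two candidates, "que" and "te", and the surrounding context decides between them, which is the role the abstract assigns to the language model.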
Similar resources
The TALP-UPC Approach to Tweet-Norm 2013
This paper describes the methodology used by the TALP-UPC team for the SEPLN 2013 shared task on tweet normalization (Tweet-Norm). The system uses a set of modules that propose different corrections for each out-of-vocabulary word. The final correction is chosen by weighted voting according to each module's accuracy.
Lexical Normalization of Spanish Tweets with Preprocessing Rules, Domain-specific Edit Distances, and Language Models
We present a system to normalize Spanish tweets, which uses preprocessing rules, a domain-appropriate edit-distance model, and language models to select correction candidates based on context. The system's results at the SEPLN 2013 Tweet-Norm task were above average.
DLSI en Tweet-Norm 2013: Normalización de Tweets en Español
Its lexical richness and easy access to large volumes of information make Web 2.0 an important resource for Natural Language Processing. Nevertheless, the frequent presence of non-normative linguistic phenomena can make automatic processing challenging. This paper describes the participation in the Text Normalisation Workshop at the SEPLN conference (Tweet-nor...
Resource-based Lexical Approach to Tweet-Norm task
This paper proposes a resource-based lexical approach for addressing the TWEET-NORM task. The proposed system exposes a simple but extensible modular architecture in which each analysis module independently proposes correction candidates for each OOV word. Each of these analysis modules tries to address a specific problem, and each one works in a very different way. The resources are used as t...
Introducción a la Tarea Compartida Tweet-Norm 2013: Normalización Léxica de Tuits en Español
An overview of the shared task is presented: description, corpora, annotation, preprocessing, participant systems, and results.